Recursive Least-Squares Learning with Eligibility Traces

Authors

  • Bruno Scherrer
  • Matthieu Geist
Abstract

In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We describe a systematic approach for adapting on-policy learning least-squares algorithms of the literature (LSTD [5], LSPE [15], FPKF [7] and GPTD [8]/KTD [10]) to off-policy learning with eligibility traces. This leads to two known algorithms, LSTD(λ)/LSPE(λ) [21], and suggests new extensions of FPKF and GPTD/KTD. We describe their recursive implementation, discuss their convergence properties, and illustrate their behavior experimentally. Overall, our study suggests that the state-of-the-art LSTD(λ) [21] remains the best least-squares algorithm.
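
The recursive implementations referred to above share a common template: accumulate a matrix A and a vector b along the trajectory, weight each increment by an eligibility trace and, in the off-policy case, by per-decision importance ratios, and propagate the inverse of A with a Sherman-Morrison update instead of re-solving the linear system at every step. The Python sketch below illustrates this template for an off-policy LSTD(λ)-style estimator. It is a minimal illustration under stated assumptions, not the authors' exact algorithm: the class name, the ε-regularized initialization of A⁻¹, the particular placement of the importance ratio ρ_t = π(a_t|s_t)/μ(a_t|s_t) (conventions differ across formulations), and the trace reset at episode boundaries are all choices made for the example.

import numpy as np


class RecursiveLSTDLambda:
    """Sketch of recursive off-policy LSTD(lambda); names and conventions are assumptions."""

    def __init__(self, n_features, gamma=0.99, lam=0.5, epsilon=1e-3):
        self.gamma, self.lam = gamma, lam
        self.theta = np.zeros(n_features)      # linear value-function weights
        self.z = np.zeros(n_features)          # eligibility trace
        self.C = np.eye(n_features) / epsilon  # running estimate of A^{-1} (A_0 = epsilon * I)
        self.prev_rho = 1.0                    # importance ratio of the previous step

    def update(self, phi, reward, next_phi, rho=1.0, done=False):
        """Process one transition (s_t, a_t, r_t, s_{t+1}); rho = pi(a_t|s_t)/mu(a_t|s_t)."""
        # Trace: z_t = phi_t + gamma * lambda * rho_{t-1} * z_{t-1}
        self.z = phi + self.gamma * self.lam * self.prev_rho * self.z
        # Importance-corrected feature difference: phi_t - gamma * rho_t * phi_{t+1}
        delta_phi = phi - (0.0 if done else self.gamma * rho * next_phi)
        # Sherman-Morrison update of C = A^{-1} for the rank-one change A += z * delta_phi^T
        Cz = self.C @ self.z
        gain = Cz / (1.0 + delta_phi @ Cz)
        self.theta += gain * (rho * reward - delta_phi @ self.theta)
        self.C -= np.outer(gain, delta_phi @ self.C)
        self.prev_rho = rho
        if done:  # reset the trace between episodes
            self.z = np.zeros_like(self.z)
            self.prev_rho = 1.0

    def value(self, phi):
        return float(self.theta @ phi)

A typical use calls update(φ(s_t), r_t, φ(s_{t+1}), ρ_t) once per transition of the behavior trajectory and reads off value(φ(s)) at any time; setting ρ_t = 1 everywhere recovers the on-policy recursive LSTD(λ) update. The per-step cost is O(n²) in the number n of features, which is the price the recursive least-squares family pays relative to gradient-style TD(λ).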


Similar Articles

Recursive Least-Squares Off-policy Learning with Eligibility Traces

In the framework of Markov Decision Processes, we consider the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We review on-policy learning least-squares algorithms of the literature (LSTD (Boyan, 1999), LSPE (Bertsekas and Ioffe, 1996), FPKF (Choi and Van Roy, 2006) and GPTD (Engel, 2005)/KTD (Ge...


Bidding Strategy on Demand Side Using Eligibility Traces Algorithm

Restructuring in the power industry involves splitting it into separate parts and creating competition between the purchasing and selling sides. As a consequence, through active participation in the energy market, service-provider companies and large consumers create a context for overcoming the problems resulting from the lack of demand-side participation in the market. The most prominent ch...


Off-policy learning with eligibility traces: a survey

In the framework of Markov Decision Processes, we consider linear off-policy learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly review on-policy learning algorithms of the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, ...


Using Sliding Mode Controller and Eligibility Traces for Controlling the Blood Glucose in Diabetic Patients at the Presence of Fault

Some people suffering from diabetes use insulin injection pumps to control their blood glucose level. Sometimes a fault may occur in the sensor or actuator of these pumps. The main objective of this paper is to keep the blood glucose at the desired level and to provide fault-tolerant control of these injection pumps. To this end, the eligibility traces algorithm is combined with the sliding mod...


iLSTD: Eligibility Traces and Convergence Analysis

We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n²), where n is the number of...
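
For context, the cost gap mentioned in this abstract can be made concrete: rather than propagating A⁻¹ at quadratic cost per step, an iLSTD-style method keeps A, b and the residual vector μ = b − Aθ, and each step nudges θ along only a few coordinates where |μ| is largest. The Python sketch below is a simplified illustration of that idea, not the exact iLSTD algorithm; the step size alpha and the number m of updated dimensions per step are assumed parameters, and a practical implementation additionally exploits feature sparsity so that the A, b and μ updates stay cheap.

import numpy as np


class IncrementalLSTDSketch:
    """Simplified illustration of the incremental (iLSTD-style) idea; parameters are assumptions."""

    def __init__(self, n_features, gamma=0.99, lam=0.0, alpha=0.01, m=1):
        self.gamma, self.lam, self.alpha, self.m = gamma, lam, alpha, m
        self.A = np.zeros((n_features, n_features))
        self.b = np.zeros(n_features)
        self.theta = np.zeros(n_features)
        self.z = np.zeros(n_features)   # eligibility trace
        self.mu = np.zeros(n_features)  # residual b - A @ theta

    def update(self, phi, reward, next_phi, done=False):
        self.z = phi + self.gamma * self.lam * self.z
        delta_phi = phi - (0.0 if done else self.gamma * next_phi)
        dA = np.outer(self.z, delta_phi)     # dense here for clarity; kept sparse in practice
        self.A += dA
        self.b += reward * self.z
        self.mu += reward * self.z - dA @ self.theta
        # Descend along the m coordinates with the largest residual.
        for _ in range(self.m):
            i = int(np.argmax(np.abs(self.mu)))
            step = self.alpha * self.mu[i]
            self.theta[i] += step
            self.mu -= step * self.A[:, i]   # keep mu = b - A @ theta consistent
        if done:
            self.z = np.zeros_like(self.z)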





Publication date: 2011